Friday 18 Sep, 2009


Discussion on paper:
  • S. Softic and M. Hausenblas, “Towards Opinion Mining Through Tracing Discussions on the Web,” Social Data on the Web (SDoW 2008) Workshop at the 7th International Semantic Web Conference, Karlsruhe, Germany, 2008

Abstract
  • This paper reports on our ongoing work regarding opinion mining from Web-based discussion forums in the realm of the Understanding Advertising (UAd) project. Our approach to opinion mining is to first RDFise discussion forums in SIOC, and in a second phase to interlink the so created data with linked datasets such as DBpedia. We are confident that this should allow a market researcher to formulate queries using domain semantics and hence understand what people think about a certain product or service. The system’s architecture, preliminary results, and the current available demonstrator are discussed in this work.

Summary
Three methods are developed to convert plain (HTML) Web content into structured data represented in RDF:
  • Plain old screen scraping (in the so-called UAd Harvester/Mapper module)
  • Pattern-based RDFising and Interlinking for online discussions (the UAd Discussion Tracer)
  • Schema-based a-priori RDFising and Interlinking (for statistical data from Eurostat)
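
As a rough illustration of what "RDFising" a scraped forum post into SIOC might look like, the sketch below turns one post into N-Triples using only the Python standard library. It is not the UAd Harvester/Mapper itself: the dictionary keys, the `example.org` URLs and the function names are assumptions made for this example. Only the SIOC namespace and the terms `sioc:Post`, `sioc:has_creator`, `sioc:has_container` and `sioc:content` come from the SIOC vocabulary.

```python
SIOC = "http://rdfs.org/sioc/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(text):
    """Escape a string so it can be written as an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def post_to_ntriples(post):
    """Turn one scraped forum post (a plain dict) into SIOC N-Triples lines."""
    subject = f"<{post['url']}>"
    return [
        f"{subject} <{RDF_TYPE}> <{SIOC}Post> .",
        f"{subject} <{SIOC}has_creator> <{post['author_url']}> .",
        f"{subject} <{SIOC}has_container> <{post['forum_url']}> .",
        f'{subject} <{SIOC}content> "{escape_literal(post["text"])}" .',
    ]


# Hypothetical scraped post; every URL here is a placeholder.
example_post = {
    "url": "http://example.org/forum/post/1",
    "author_url": "http://example.org/user/alice",
    "forum_url": "http://example.org/forum/phones",
    "text": 'The new phone is "great".',
}

triples = post_to_ntriples(example_post)
for line in triples:
    print(line)
```

A real pipeline would more likely use an RDF library and a proper HTML parser; plain string building is used here only to keep the mapping from scraped fields to SIOC terms visible.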

Relevant Tools/Websites
  • Semantically-Interlinked Online Communities (SIOC): The SIOC initiative aims to enable the integration of online community information. SIOC provides a Semantic Web ontology for representing rich data from the Social Web in RDF.
  • DBpedia: DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows users to ask expressive queries against Wikipedia and to interlink other datasets on the Web with DBpedia data.
  • Linking Open Data (LOD) Community Project: Linked Data is about using the Web to connect related data that wasn't previously linked, or using the Web to lower the barriers to linking data currently linked using other methods. More specifically, Wikipedia defines Linked Data as "a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF."
  • SentiWordNet: SentiWordNet is a lexical resource for opinion mining. SentiWordNet assigns to each synset of WordNet three sentiment scores: positivity, negativity, objectivity. SentiWordNet is described in detail in the paper "SentiWordNet: A Publicly Available Lexical Resource for Opinion Mining".
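
To make the three SentiWordNet scores concrete: for each synset they sum to 1, so objectivity can be derived as 1 - (positivity + negativity). The sketch below applies that to a toy, word-level lexicon. The lexicon entries are invented for illustration and are not real SentiWordNet values, and the function names are assumptions. Real SentiWordNet scores are attached to synsets rather than words, so a real system would also need word-sense selection, which this sketch ignores.

```python
# Hypothetical (positivity, negativity) scores per word; NOT real SentiWordNet data.
TOY_LEXICON = {
    "great": (0.75, 0.0),
    "poor": (0.0, 0.625),
    "battery": (0.0, 0.0),
}


def objectivity(pos, neg):
    """Objectivity is whatever remains once positivity and negativity are removed."""
    return 1.0 - (pos + neg)


def score_text(text, lexicon):
    """Average the scores of known words; returns (positivity, negativity, objectivity)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    known = [lexicon[w] for w in words if w in lexicon]
    if not known:
        # No scored words at all: treat the text as fully objective.
        return (0.0, 0.0, 1.0)
    pos = sum(p for p, _ in known) / len(known)
    neg = sum(n for _, n in known) / len(known)
    return (pos, neg, objectivity(pos, neg))


print(score_text("Great battery!", TOY_LEXICON))
```

Averaging over words is only one possible aggregation; the paper's notes above do not say how UAd combines scores.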


Things that need to be explored in the UAd project:
UAd Acquisition
  • The project uses widely deployed vocabularies, e.g. Semantically-Interlinked Online Communities (SIOC), along with existing APIs for the extraction and structuring phase.
  • The project aims at orienting the opinion holder context on domain semantics, along with exploiting linked datasets (such as DBpedia) and domain-delimited query expansion. The interlinking with DBpedia is done manually.
  • A client/server system was implemented to perform the data acquisition in UAd. The server has been implemented using a Java application server (Tomcat) along with a Jena 2/PostgreSQL RDF store, taking care of the scheduling and execution of the acquisition tasks. At the client side, a Firefox plug-in allows a user to define, control and monitor the tracing tasks. The plug-in has been developed in JavaScript and XUL (http://developer.mozilla.org/en/docs/XUL).
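
The manual interlinking with DBpedia mentioned in this list could, for instance, be kept as a hand-maintained lookup table from keywords to DBpedia resource URIs. The sketch below is one assumed way to do that, not the UAd implementation: the table entries, the `example.org` post URL, the function name and the choice of `sioc:topic` as the linking property are all illustrative assumptions.

```python
SIOC_TOPIC = "http://rdfs.org/sioc/ns#topic"

# Hand-curated table: keyword -> DBpedia resource URI (illustrative entries only).
MANUAL_LINKS = {
    "iphone": "http://dbpedia.org/resource/IPhone",
    "nokia": "http://dbpedia.org/resource/Nokia",
}


def interlink(post_url, text, links=MANUAL_LINKS):
    """Emit one sioc:topic triple for each table keyword that occurs in the text."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return [
        f"<{post_url}> <{SIOC_TOPIC}> <{uri}> ."
        for keyword, uri in sorted(links.items())
        if keyword in words
    ]


for line in interlink("http://example.org/forum/post/1", "My Nokia is fine."):
    print(line)
```

Exact keyword matching misses misspellings and multi-word product names, which is one reason such a table needs manual upkeep.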

UAd Analyser
  • The UAd Analyser is a Web application, implemented with the Google Web Toolkit (http://code.google.com/webtoolkit/), that allows a market researcher to examine the data gathered by the UAd Acquisition Server.

Note:
  • The W3C has recently launched the “Product Modelling Incubator Group”, which aims at creating a product modelling ontology.
  • The Linking Open Data (LOD) community project is an open, collaborative effort applying the linked data principles. It aims at bootstrapping the Web of Data by publishing datasets in RDF on the Web and creating large numbers of links between these datasets. The datasets included in the project are diverse in both nature and size. Currently, the project includes some 30 different datasets, ranging from rather centralized ones (such as DBpedia) to those that are very distributed (for example the FOAF-o-sphere).